Combining pre-editing and post-editing to improve SMT of user- generated content

نویسندگان

  • Johanna Gerlach
  • Victoria Porro
  • Pierrette Bouillon
  • Sabine Lehmann
چکیده

The poor quality of user-generated content (UGC) found in forums hinders both readability and machine-translatability. To improve these two aspects, we have developed humanand machine-oriented pre-editing rules, which correct or reformulate this content. In this paper we present the results of a study which investigates whether pre-editing rules that improve the quality of statistical machine translation (SMT) output also have a positive impact on post-editing productivity. For this study, pre-editing rules were applied to a set of French sentences extracted from a technical forum. After SMT, the post-editing temporal effort and final quality are compared for translations of the raw source and its pre-edited version. Results obtained suggest that pre-editing speeds up post-editing and that the combination of the two processes is worthy of further investigation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The ACCEPT Portal: An Online Framework for the Pre-editing and Post-editing of User-Generated Content

With the development of Web 2.0, a lot of content is nowadays generated online by users. Due to its characteristics (e.g., use of jargon and abbreviations, typos, grammatical and style errors), the user-generated content poses specific challenges to machine translation. This paper presents an online platform devoted to the pre-editing of user-generated content and its post-editing, two main typ...

متن کامل

Rule-based Automatic Post-processing of SMT Output to Reduce Human Post-editing Effort

To enhance sharing of knowledge across the language barrier, the ACCEPT project focuses on improving machine translation of user-generated content by investigating preand postediting strategies. Within this context, we have developed automatic monolingual post-editing rules for French, aimed at correcting frequent errors automatically. The rules were developed using the Acrolinx IQ technology, ...

متن کامل

A Large-Scale Evaluation of Pre-editing Strategies for Improving User-Generated Content Translation

The user-generated content represents an increasing share of the information available today. To make this type of content instantly accessible in another language, the ACCEPT project focuses on developing pre-editing technologies for correcting the source text in order to increase its translatability. Linguistically-informed pre-editing rules have been developed for English and French for the ...

متن کامل

ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool

This paper presents ProphetMT, a tree-based SMT-driven Controlled Language (CL) authoring and post-editing tool. ProphetMT employs the source-side rules in a translation model and provides them as auto-suggestions to users. Accordingly, one might say that users are writing in a ‘Controlled Language’ that is ‘understood’ by the computer. ProphetMT also allows users to easily attach structural in...

متن کامل

ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tools

This paper presents ProphetMT, a tree-based SMT-driven Controlled Language (CL) authoring and post-editing tool. ProphetMT employs the source-side rules in a translation model and provides them as auto-suggestions to users. Accordingly, one might say that users are writing in a ‘Controlled Language’ that is ‘understood’ by the computer. ProphetMT also allows users to easily attach structural in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013